Overview

Dataset Statistics

Number of Variables 9
Number of Rows 1.7754e+07
Missing Cells 6.5947e+06
Missing Cells (%) 4.1%
Duplicate Rows 5005
Duplicate Rows (%) 0.0%
Total Size in Memory 6.6 GB
Average Row Size in Memory 399.0 B
Variable Types
  • Categorical: 5
  • Numerical: 4
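
The headline percentages can be cross-checked against the per-column figures reported further down. The sketch below assumes the exact row count is 17753855 (the report only prints the rounded 1.7754e+07; the exact value is inferred from the character counts in the event_time section) and that only category_code, brand and user_session contain missing values. Variable names are illustrative.

```python
# Figures copied from the report; the exact row count is an assumption
# inferred from the event_time character counts.
n_rows = 17_753_855
n_vars = 9
missing_by_column = {
    "category_code": 4_250_023,
    "brand": 2_344_678,
    "user_session": 2,
}

# Missing Cells is the sum of per-column missing counts.
missing_cells = sum(missing_by_column.values())

# Missing Cells (%) is relative to all cells, i.e. rows x variables.
missing_pct = 100 * missing_cells / (n_rows * n_vars)

# Duplicate Rows (%) is relative to the number of rows.
duplicate_pct = 100 * 5005 / n_rows

print(missing_cells, round(missing_pct, 1), round(duplicate_pct, 3))
```

Under these assumptions the sum of the per-column counts agrees with the reported 6.5947e+06 missing cells and 4.1%, and the duplicate share is small enough to display as 0.0% at one decimal place.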

Dataset Insights

category_code has 4250023 (23.94%) missing values
brand has 2344678 (13.21%) missing values
product_id is skewed
category_id is skewed
price is skewed
user_id is skewed
event_time has a high cardinality: 6084922 distinct values
category_code has a high cardinality: 135 distinct values
brand has a high cardinality: 4622 distinct values
user_session has a high cardinality: 11445609 distinct values
event_time has constant length 29
user_session has constant length 36

Variables


event_time

categorical

Approximate Distinct Count 6084922
Approximate Unique (%) 34.3%
Missing 0
Missing (%) 0.0%
Memory Size 1.6 GB

Length

Mean 29
Standard Deviation 0
Median 29
Minimum 29
Maximum 29

Sample

1st row 2019-11-30T18:00:0...
2nd row 2019-11-30T18:00:1...
3rd row 2019-11-30T18:00:1...
4th row 2019-11-30T18:00:1...
5th row 2019-11-30T18:00:2...

Letter

Count 17753855
Lowercase Letter 0
Space Separator 0
Uppercase Letter 17753855
Dash Punctuation 53261565
Decimal Number 372830955
  • event_time contains many words: 6084922 words
  • event_time has words of constant length
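
Because every event_time value has length 29, the character-class totals above can be reduced to a per-value layout. This is a small sketch, assuming the exact row count is 17753855 (inferred, not printed in the report); it does not attempt to reconstruct the truncated sample values.

```python
# Totals copied from the Letter table; n_rows is an assumed exact row count.
n_rows = 17_753_855
value_length = 29
class_totals = {
    "uppercase": 17_753_855,
    "dash": 53_261_565,
    "digit": 372_830_955,
}

per_value = {}
for name, total in class_totals.items():
    count, remainder = divmod(total, n_rows)
    # A zero remainder means every value can carry the same count of this class.
    assert remainder == 0
    per_value[name] = count

# Characters not covered by the three reported classes (other punctuation).
other_per_value = value_length - sum(per_value.values())

print(per_value, other_per_value)
```

Each total divides evenly by the row count, which is consistent with a fixed timestamp layout: one uppercase letter, three dashes and 21 digits per value, plus four characters from classes the table does not list.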

event_type

categorical

Approximate Distinct Count 3
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 1.1 GB
  • The largest value (view) is over 22.79 times larger than the second largest value (cart)

Length

Mean 4.0636
Standard Deviation 0.5004
Median 4
Minimum 4
Maximum 8

Sample

1st row view
2nd row view
3rd row view
4th row view
5th row view

Letter

Count 72144856
Lowercase Letter 72144856
Space Separator 0
Uppercase Letter 0
Dash Punctuation 0
Decimal Number 0
  • The top 2 categories (view, cart) take over 50.0%
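
The length statistics are enough to estimate how rare the third event type is. The sketch below assumes an exact row count of 17753855 and that every value is either 4 characters long (view, cart) or 8 characters long (the third type, which the report does not name). Both are assumptions, not figures stated by the report.

```python
import math

n_rows = 17_753_855          # assumed exact row count
total_chars = 72_144_856     # "Count" in the Letter table
short_len, long_len = 4, 8   # Minimum and Maximum length

# Each 8-character value contributes 4 characters beyond the 4-per-row baseline.
n_long = (total_chars - short_len * n_rows) // (long_len - short_len)
n_short = n_rows - n_long
share_long = n_long / n_rows

# Consistency check: mean and standard deviation of a two-point length distribution.
mean_len = short_len + (long_len - short_len) * share_long
std_len = (long_len - short_len) * math.sqrt(share_long * (1 - share_long))

print(n_long, round(100 * share_long, 2), round(mean_len, 4), round(std_len, 4))
```

Under these assumptions the derived mean and standard deviation reproduce the reported 4.0636 and 0.5004, which suggests roughly 1.6% of rows belong to the third event type.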

product_id

numerical

Approximate Distinct Count 201817
Approximate Unique (%) 1.1%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 270.9 MB
Mean 1.3981e+07
Minimum 1000365
Maximum 100064485
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • product_id is skewed right (γ1 = 2.8352)

Quantile Statistics

Minimum 1000365
5-th Percentile 1.0044e+06
Q1 1.2013e+06
Median 5.1008e+06
Q3 1.8001e+07
95-th Percentile 5.49e+07
Maximum 100064485
Range 99064120
IQR 1.68e+07

Descriptive Statistics

Mean 1.3981e+07
Standard Deviation 2.0951e+07
Variance 4.3893e+14
Sum 2.4822e+14
Skewness 2.8352
Kurtosis 8.5927
Coefficient of Variation 1.4985
  • product_id is not normally distributed (p-value 2.8460030832266495e-21)
  • product_id has 1135700 outliers
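
The descriptive statistics are tied together by simple identities: the variance is the square of the standard deviation, and the coefficient of variation is the standard deviation divided by the mean. A quick sketch of that check on the rounded product_id figures (small discrepancies are expected because the report prints only five significant digits):

```python
import math

# Rounded values copied from the product_id section.
mean = 1.3981e07
std = 2.0951e07
reported_variance = 4.3893e14
reported_cv = 1.4985

variance = std ** 2      # variance = std^2
cv = std / mean          # coefficient of variation = std / mean

print(f"{variance:.4e}", round(cv, 4))
print(math.isclose(variance, reported_variance, rel_tol=1e-3))
print(math.isclose(cv, reported_cv, rel_tol=1e-3))
```

Since product_id is an identifier rather than a measurement, these moments describe how IDs were allocated, not a property of the products themselves.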

category_id

numerical

Approximate Distinct Count 1205
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 270.9 MB
Mean -3.4347e+11
Minimum 2053013551865397438
Maximum 2232732138325672372
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • category_id is skewed right (γ1 = 0.9689)

Quantile Statistics

Minimum 2053013551865397438
5-th Percentile 2.053e+18
Q1 2.053e+18
Median 2.053e+18
Q3 2.2327e+18
95-th Percentile 2.2327e+18
Maximum 2232732138325672372
Range 179718586460274934
IQR 1.7972e+17

Descriptive Statistics

Mean -3.4347e+11
Standard Deviation 7.9037e+16
Variance 6.2469e+33
Sum -6.098e+18
Skewness 0.9689
Kurtosis -1.0083
Coefficient of Variation -230111.2743
  • category_id is not normally distributed (p-value 1.7260997329535478e-22)
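
The reported Mean (-3.4347e+11) and Sum (-6.098e+18) for category_id cannot be correct, since every value lies between the Minimum and Maximum, both around 2e+18. One likely explanation, which the report does not confirm, is that the sum was accumulated in a signed 64-bit integer and wrapped around. The sketch below emulates 64-bit wraparound in plain Python; `wrap_int64` is a hypothetical helper written for this illustration.

```python
INT64_MAX = 2**63 - 1

def wrap_int64(x: int) -> int:
    """Reduce an integer to the signed 64-bit range, as a wrapping sum would."""
    return (x + 2**63) % 2**64 - 2**63

n_rows = 17_753_855                    # assumed exact row count
min_id = 2_053_013_551_865_397_438     # reported Minimum
reported_sum = -6.098e18               # reported Sum
reported_mean = -3.4347e11             # reported Mean

# Five copies of even the smallest category_id already exceed the int64 range.
overflows_after_five = 5 * min_id > INT64_MAX
wrapped_five = wrap_int64(5 * min_id)

# The reported Mean is what results from dividing the (wrapped) Sum by the row count.
mean_from_sum = reported_sum / n_rows

print(overflows_after_five, wrapped_five, f"{mean_from_sum:.4e}")
```

If this is what happened, the Mean, Sum and Coefficient of Variation for category_id should be disregarded. The quantiles, Minimum and Maximum are not derived from a sum and remain plausible.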

category_code

categorical

Approximate Distinct Count 135
Approximate Unique (%) 0.0%
Missing 4250023
Missing (%) 23.9%
Memory Size 1.1 GB
  • The largest value (electronics.smartphone) is over 1.73 times larger than the second largest value (construction.tools.light)

Length

Mean 22.3955
Standard Deviation 5.2564
Median 24
Minimum 9
Maximum 38

Sample

1st row construction.tools...
2nd row apparel.costume
3rd row construction.tools...
4th row electronics.audio....
5th row computers.peripher...

Letter

Count 280854816
Lowercase Letter 280854816
Space Separator 0
Uppercase Letter 0
Dash Punctuation 0
Decimal Number 0

brand

categorical

Approximate Distinct Count 4622
Approximate Unique (%) 0.0%
Missing 2344678
Missing (%) 13.2%
Memory Size 1.0 GB

Length

Mean 5.977
Standard Deviation 1.7176
Median 6
Minimum 2
Maximum 43

Sample

1st row force
2nd row xiaomi
3rd row missha
4th row huawei
5th row awei

Letter

Count 91949561
Lowercase Letter 91949561
Space Separator 0
Uppercase Letter 0
Dash Punctuation 135954
Decimal Number 0
  • brand contains many words: 4599 words

price

numerical

Approximate Distinct Count 81169
Approximate Unique (%) 0.5%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 270.9 MB
Mean 285.1746
Minimum 0
Maximum 2574.07
Zeros 31977
Zeros (%) 0.2%
Negatives 0
Negatives (%) 0.0%
  • price is skewed right (γ1 = 2.6338)

Quantile Statistics

Minimum 0
5-th Percentile 18.9296
Q1 66.67
Median 163.87
Q3 355.19
95-th Percentile 1013.18
Maximum 2574.07
Range 2574.07
IQR 288.52

Descriptive Statistics

Mean 285.1746
Standard Deviation 353.3986
Variance 124890.5931
Sum 5.0629e+09
Skewness 2.6338
Kurtosis 8.9676
Coefficient of Variation 1.2392
  • price is not normally distributed (p-value 4.055066193272417e-12)
  • price has 1549465 outliers
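
The report does not say how outliers are defined. A common convention is Tukey's rule, which flags values more than 1.5 × IQR beyond the quartiles. The sketch below applies that rule to the reported price quartiles, as an assumption rather than the tool's documented method.

```python
# Quartiles copied from the price section.
q1 = 66.67
q3 = 355.19
iqr = q3 - q1                    # reported IQR is 288.52

# Tukey fences with the conventional multiplier (an assumption here).
k = 1.5
lower_fence = q1 - k * iqr
upper_fence = q3 + k * iqr

# Share of rows the report flags as outliers, using the rounded row count.
outlier_share = 1_549_465 / 1.7754e07

print(round(iqr, 2), round(lower_fence, 2), round(upper_fence, 2))
print(round(100 * outlier_share, 1))
```

Under this rule the lower fence is negative, so only high prices could be flagged. The upper fence falls below the 95th percentile (1013.18), which is at least consistent with more than 5% of rows being reported as outliers.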

user_id

numerical

Approximate Distinct Count 3855594
Approximate Unique (%) 21.7%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 270.9 MB
Mean 5.4104e+08
Minimum 29515875
Maximum 595414563
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • user_id is skewed right (γ1 = 0.0763)

Quantile Statistics

Minimum 29515875
5-th Percentile 5.129e+08
Q1 5.1714e+08
Median 5.3854e+08
Q3 5.6216e+08
95-th Percentile 5.8526e+08
Maximum 595414563
Range 565898688
IQR 4.5015e+07

Descriptive Statistics

Mean 5.4104e+08
Standard Deviation 2.5052e+07
Variance 6.276e+14
Sum 9.6055e+15
Skewness 0.07633
Kurtosis 2.1089
Coefficient of Variation 0.0463
  • user_id is not normally distributed (p-value 2.4079506366252363e-10)
  • user_id has 13272 outliers

user_session

categorical

Approximate Distinct Count 11445609
Approximate Unique (%) 64.5%
Missing 2
Missing (%) 0.0%
Memory Size 1.7 GB

Length

Mean 36
Standard Deviation 0
Median 36
Minimum 36
Maximum 36

Sample

1st row de33debe-c7bf-44e8...
2nd row 370e8c88-3d07-41df...
3rd row 7dacfa36-da9e-4282...
4th row 3de3ac21-f446-4cf5...
5th row e65a6eb9-20ea-4991...

Letter

Count 208674268
Lowercase Letter 208674268
Space Separator 0
Uppercase Letter 0
Dash Punctuation 71015412
Decimal Number 359449028
  • user_session contains many words: 11445609 words
  • user_session has words of constant length
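
A constant length of 36 with hyphen-separated groups matches the canonical textual form of a UUID (32 hexadecimal digits and 4 hyphens). The sketch below checks that the character totals fit this layout and shows one way to validate individual values with the standard library. `is_canonical_uuid` is a hypothetical helper, the row count is an assumed exact value, and the example value is generated rather than taken from the truncated samples.

```python
import uuid

n_rows = 17_753_855            # assumed exact row count
n_present = n_rows - 2         # two user_session values are missing

# 4 hyphens per value, and 32 hex characters split between letters and digits.
dashes_match = 4 * n_present == 71_015_412
hex_chars_match = 32 * n_present == 208_674_268 + 359_449_028

def is_canonical_uuid(value) -> bool:
    """Return True only for lowercase, hyphenated 36-character UUID strings."""
    try:
        return str(uuid.UUID(value)) == value
    except (ValueError, AttributeError, TypeError):
        # Malformed strings and non-string inputs (e.g. missing values) end up here.
        return False

example = str(uuid.uuid4())    # generated value, for illustration only
print(dashes_match, hex_chars_match, is_canonical_uuid(example))
```

Comparing the parsed UUID's string form back to the input rejects variants that `uuid.UUID` would otherwise accept, such as uppercase or unhyphenated spellings.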

Interactions

Correlations

Missing Values